14 research outputs found

    Deep Reinforcement Learning Attitude Control of Fixed-Wing UAVs Using Proximal Policy Optimization

    Full text link
    Contemporary autopilot systems for unmanned aerial vehicles (UAVs) are far more limited in their flight envelope as compared to experienced human pilots, thereby restricting the conditions UAVs can operate in and the types of missions they can accomplish autonomously. This paper proposes a deep reinforcement learning (DRL) controller to handle the nonlinear attitude control problem, enabling extended flight envelopes for fixed-wing UAVs. A proof-of-concept controller using the proximal policy optimization (PPO) algorithm is developed, and is shown to be capable of stabilizing a fixed-wing UAV from a large set of initial conditions to reference roll, pitch and airspeed values. The training process is outlined and key factors for its progression rate are considered, with the most important factor found to be limiting the number of variables in the observation vector, and including values for several previous time steps for these variables. The trained reinforcement learning (RL) controller is compared to a proportional-integral-derivative (PID) controller, and is found to converge in more cases than the PID controller, with comparable performance. Furthermore, the RL controller is shown to generalize well to unseen disturbances in the form of wind and turbulence, even in severe disturbance conditions.
    Comment: 11 pages, 3 figures, 2019 International Conference on Unmanned Aircraft Systems (ICUAS)
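
    The key training factor identified above, a small observation vector augmented with values from several previous time steps, can be illustrated with a short sketch. The variable names, the choice of six state variables, and the history length of five steps below are illustrative assumptions, not details taken from the paper.

```python
# Sketch of an observation vector with a short history window, as described in
# the abstract above. The state variables and history length are assumptions.
from collections import deque
import numpy as np

class ObservationHistory:
    """Stacks the k most recent observations into a single flat policy input."""

    def __init__(self, num_vars: int, history_len: int = 5):
        self.history = deque(
            [np.zeros(num_vars)] * history_len, maxlen=history_len
        )

    def update(self, obs: np.ndarray) -> np.ndarray:
        """Append the newest observation and return the stacked vector."""
        self.history.append(np.asarray(obs, dtype=float))
        return np.concatenate(list(self.history))

# Example: roll, pitch and airspeed errors plus angular rates (6 variables),
# stacked over the last 5 control steps, give a 30-dimensional policy input.
buffer = ObservationHistory(num_vars=6, history_len=5)
policy_input = buffer.update(np.array([0.1, -0.05, 1.2, 0.0, 0.02, -0.3]))
assert policy_input.shape == (30,)
```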

    Pseudo-Hamiltonian neural networks with state-dependent external forces

    Get PDF
    Hybrid machine learning based on Hamiltonian formulations has recently been successfully demonstrated for simple mechanical systems, both energy conserving and not energy conserving. We introduce a pseudo-Hamiltonian formulation that is a generalization of the Hamiltonian formulation via the port-Hamiltonian formulation, and show that pseudo-Hamiltonian neural network models can be used to learn external forces acting on a system. We argue that this property is particularly useful when the external forces are state dependent, in which case it is the pseudo-Hamiltonian structure that facilitates the separation of internal and external forces. Numerical results are provided for a forced and damped mass–spring system and a tank system of higher complexity, and a symmetric fourth-order integration scheme is introduced for improved training on sparse and noisy data.
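
    As a rough sketch of the kind of formulation the abstract refers to, a pseudo-Hamiltonian system can be written as an internal part, split into a structure-preserving and a dissipative term, plus an external-force term. The notation below follows common port-Hamiltonian conventions and is an assumption here rather than a quotation from the paper.

```latex
% Hedged sketch of a pseudo-Hamiltonian formulation. S(x) is skew-symmetric,
% R(x) is positive semi-definite (dissipation), H is the Hamiltonian, and
% F(x, t) collects the possibly state-dependent external forces.
\[
  \dot{x} = \bigl(S(x) - R(x)\bigr)\,\nabla H(x) + F(x, t)
\]
% Learning H and F with separate neural networks is what allows the internal
% and external contributions to be separated, as the abstract describes.
```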

    Port-Hamiltonian Neural Networks with State-Dependent Ports

    Get PDF
    Hybrid machine learning based on Hamiltonian formulations has recently been successfully demonstrated for simple mechanical systems, both energy conserving and not energy conserving. We show that port-Hamiltonian neural network models can be used to learn external forces acting on a system. We argue that this property is particularly useful when the external forces are state dependent, in which case it is the port-Hamiltonian structure that facilitates the separation of internal and external forces. Numerical results are provided for a forced and damped mass-spring system and a tank system of higher complexity, and a symmetric fourth-order integration scheme is introduced for improved training on sparse and noisy data.
    Comment: 21 pages, 12 figures; v3: restructured the paper for more clarity, major changes to the text, updated plots
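
    To make the idea above concrete, the sketch below parametrizes the Hamiltonian H and the external force F as separate neural networks, with a fixed canonical structure matrix, roughly in the spirit of the model the abstract describes. The network sizes, the two-dimensional state, and the omission of a learned dissipation term are simplifying assumptions, not the paper's architecture.

```python
# Hedged sketch of a port-Hamiltonian-style neural network: H and F are
# learned separately, and the structure matrix S is fixed and skew-symmetric.
# All sizes and the absence of a dissipation term are assumptions.
import torch
import torch.nn as nn

class PortHamiltonianNN(nn.Module):
    def __init__(self):
        super().__init__()
        # Hamiltonian network H: R^2 -> R, external-force network F: R^2 -> R^2.
        self.H = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
        self.F = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
        # Canonical skew-symmetric structure matrix for a 1-DOF mechanical system.
        self.register_buffer("S", torch.tensor([[0.0, 1.0], [-1.0, 0.0]]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Predict dx/dt = S grad H(x) + F(x) for a batch of states x."""
        x = x.requires_grad_(True)
        grad_H = torch.autograd.grad(self.H(x).sum(), x, create_graph=True)[0]
        return grad_H @ self.S.T + self.F(x)

# Training would regress this prediction against observed derivatives, or use
# an integration scheme such as the symmetric scheme mentioned in the abstract.
model = PortHamiltonianNN()
x_dot = model(torch.randn(8, 2))  # predicted derivatives for 8 sample states
```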

    Behaviour and habitat use of first-time migrant Arctic charr: novel insights from a subarctic marine area

    Get PDF
    Anadromous Arctic charr Salvelinus alpinus is a cold-adapted salmonid that is vulnerable to climate warming and anthropogenic activities including salmon farming, hydropower regulation, and pollution, which together pose a multiple-stressor scenario that influences or threatens populations. We studied the horizontal and vertical behaviour of Arctic charr tagged with acoustic transmitters (n = 45, mean fish length: 22 cm) in a pristine, subarctic marine area to provide insights into the behaviour of first-time migrants. Tagged fish spent up to 78 d at sea, with high marine survival (82% returned to their native watercourse). While at sea, they utilized mostly near-shore areas, up to 45 km away from their native river. Arctic charr showed large variation in migration distance (mean ± SD: 222 ± 174 km), and the migration distance increased with body size. Although the fish displayed a strong fidelity to surface waters (0-3 m), spatiotemporal variation in depth use was evident, with fish utilizing deeper depths during the day and in late July. These results represent baseline data on Arctic charr’s marine behaviour in a pristine fjord system and highlight the importance of near-shore surface waters as feeding areas for first-time migrants. Furthermore, the observed dependency on coastal areas implies a vulnerability to increasing human-induced perturbations, on top of impacts by large-scale climate change in marine and freshwater habitats.

    Reinforcement Learning for Optimization of Nonlinear and Predictive Control

    No full text
    Autonomous systems extend human capabilities and can be equipped with superhuman attributes such as durability, strength, and perception, providing benefits such as superior efficiency, accuracy, and endurance, as well as the ability to explore dangerous environments. Delivering on this potential requires a control system that can skillfully operate the autonomous system to complete its objectives. A static control system must be carefully designed to handle any situation that might arise. This motivates introducing learning into the control system, since a learning system can draw on its experience to manage novel, unexpected events and changes in its operating environment.
Traditional formal control techniques are typically designed offline, assuming exact knowledge of the dynamics of the system to be controlled. These knowledge-based approaches have the important benefit that the stability properties of the control algorithm can be analyzed and certified, so that one can have confidence in the control system’s ability to safely operate the controlled system. However, linear control techniques applied to nonlinear systems (which all real systems are to some extent) lead to increasingly conservative and therefore suboptimal control performance the more nonlinear the controlled system is. Nonlinear control techniques often have considerable online computational complexity, which makes them infeasible for systems with fast dynamics and for embedded control applications where computational power and energy are limited resources. Reinforcement learning is a framework for developing self-optimizing controllers that learn to improve their operation through trial and error, adjusting their behaviour based on the observed outcomes of their actions. In general, reinforcement learning requires no knowledge about the dynamics of the controlled system, can learn to operate arbitrarily nonlinear systems, and its online operation can be designed to be highly computationally efficient. It is therefore a valuable tool for control systems where the dynamics are fast, nonlinear, or uncertain, and difficult to model. A central challenge of reinforcement learning control, on the other hand, is that its behaviour is complex and difficult to analyze, and that it has no inherent support for specifying operating constraints. An approach to remedying these challenges is to combine the learning capabilities of reinforcement learning with an existing, trusted control technique.
In Part I of this thesis, we employ reinforcement learning for optimization of the model predictive control (MPC) scheme, a powerful yet complex control technique. We propose the novel idea of optimizing its meta-parameters, that is, parameters affecting the structure of the control problem the MPC solves, as opposed to internal parameters affecting the solution to a given problem. In particular, we optimize the meta-parameters of when to compute the MPC and with what prediction horizon, and show that by intelligently selecting the conditions under which it is computed, control performance and computational complexity can be improved simultaneously. We subsequently present a framework in which these meta-parameters, as well as any other internal parameter of the MPC, can be jointly optimized with a configurable objective. Finally, Part I of the thesis also considers how an existing controller can be used to accelerate the learning process of a learning controller.
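
    To illustrate the meta-parameter idea described above, the sketch below shows a control loop in which a policy decides at each step whether to recompute the MPC solution and with which prediction horizon. The policy, the MPC solver, and the dynamics are placeholder stubs introduced for illustration; in the thesis the policy is learned with reinforcement learning, and none of the names or numbers below come from that work.

```python
# Hedged sketch of the meta-parameter idea: a (here hand-coded) policy decides
# when to recompute the MPC solution and with what prediction horizon. In the
# thesis this policy is learned with RL; everything below is a placeholder.
import numpy as np

def meta_policy(state: np.ndarray, plan_age: int) -> tuple[bool, int]:
    """Stub meta-policy: recompute when the plan is stale or the error is large."""
    error = float(np.linalg.norm(state))
    recompute = plan_age >= 5 or error > 1.0
    horizon = 20 if error > 1.0 else 10  # longer horizon far from the reference
    return recompute, horizon

def solve_mpc(state: np.ndarray, horizon: int) -> np.ndarray:
    """Stub MPC solver: returns a sequence of inputs of the requested length."""
    return np.full(horizon, -0.1 * float(state[0]))

def step(state: np.ndarray, u: float) -> np.ndarray:
    """Stub dynamics: slowly decaying scalar state driven by the input."""
    return 0.9 * state + u

state, plan, plan_age = np.array([2.0]), np.zeros(0), 0
for t in range(50):
    recompute, horizon = meta_policy(state, plan_age)
    if recompute or plan.size == 0:
        plan, plan_age = solve_mpc(state, horizon), 0   # pay the solve cost only here
    u, plan = float(plan[0]), plan[1:]                  # apply the first planned input
    state, plan_age = step(state, u), plan_age + 1
```
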
Control of unmanned aerial vehicles (UAVs) is precisely such an embedded application, with limited computational and energy resources, and one where the dynamics are highly nonlinear and affected by significant disturbances such as turbulence. In Part II of this thesis, we propose the novel idea of employing deep reinforcement learning (DRL) for low-level control of fixed-wing UAVs, a UAV design that exhibits superior range and payload capacity compared to the popular multirotor drone design. We present a method capable of learning flightworthy DRL controllers with as little as 3 minutes of interaction with the controlled system, and demonstrate through field experiments on the real UAV that the DRL controller is competitive with the existing state-of-the-art autopilot, generating smooth responses in both the controlled states and the control signals sent to the actuators.
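
    As a purely illustrative sketch of what training such a low-level controller with off-the-shelf deep RL tooling can look like, the snippet below trains a policy on a stand-in continuous-control task. The choice of library, algorithm, environment, and training budget are assumptions made for illustration and do not reflect the setup used in the thesis.

```python
# Illustrative only: train a policy on a stand-in continuous-control task with
# an off-the-shelf deep RL library. The real UAV attitude-control environment,
# algorithm choice, and hyperparameters from the thesis are not reproduced here.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("Pendulum-v1")              # stand-in for the UAV attitude task
model = PPO("MlpPolicy", env, verbose=0)   # policy maps observations to actuator commands
model.learn(total_timesteps=20_000)        # interaction budget would be tuned to the real system
model.save("low_level_controller")
```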

    Google Earth, the Universal GIS Client?

    No full text
    Google has released a virtual three-dimensional model of the Earth through the product Google Earth. This product gives us easy access to huge amounts of high-quality geodata free of charge. This thesis addresses the impact this product may have on geographical information systems (GIS) by making available a new platform for geographic information services. I discuss how Google Earth performs as a universal GIS client by exploring its possibilities and limitations. I do this by developing a framework to be used on third-party servers offering geographical information services for the client. I discuss topics such as how to generate Google Earth data (KML documents) from a high-level object model, how to manage symbols with three-dimensional graphics, how to handle requests, authentication and sessions from Google Earth clients, how to limit the data to include only objects of interest at a sufficient level of detail, and how to present a graphical user interface to the user in order to get parameters back. To test the framework's flexibility, I implement several test cases that use it to offer different services, including fleet management, presentation of health information, and an overview of national resources for emergency use. Although I discover some limitations, the main conclusion of this thesis work is that Google Earth may perform well as a universal geographic information system client. Google Earth differs from traditional GIS systems by letting the user access geodata from different services and have everything presented at the same time on the same map; in this respect, the program acts much like a web browser.
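
    As a minimal sketch of the kind of KML generation from a high-level object model discussed above, the snippet below renders a list of placemark objects into a KML document that a Google Earth client could load. The class names and the single-placemark example are illustrative assumptions; the thesis framework also covers symbols, sessions, and detail levels.

```python
# Hedged sketch: render a high-level object model into a minimal KML document.
# Class names and the example data are illustrative assumptions.
from dataclasses import dataclass
from xml.sax.saxutils import escape

@dataclass
class Placemark:
    name: str
    lon: float
    lat: float
    alt: float = 0.0

    def to_kml(self) -> str:
        return (
            "<Placemark>"
            f"<name>{escape(self.name)}</name>"
            f"<Point><coordinates>{self.lon},{self.lat},{self.alt}</coordinates></Point>"
            "</Placemark>"
        )

def to_kml_document(placemarks: list[Placemark]) -> str:
    """Wrap placemarks in a KML document (coordinates are lon,lat,alt)."""
    body = "".join(p.to_kml() for p in placemarks)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        f"{body}</Document></kml>"
    )

# A server-side service would return such a document in response to a
# Google Earth network-link request.
print(to_kml_document([Placemark("Vehicle 42", lon=10.40, lat=63.42)]))
```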

    Deep Reinforcement Learning Attitude Control of Fixed Wing UAVs Using Proximal Policy Optimization

    Get PDF
    Postprint version of the published article.
    Contemporary autopilot systems for unmanned aerial vehicles (UAVs) are far more limited in their flight envelope as compared to experienced human pilots, thereby restricting the conditions UAVs can operate in and the types of missions they can accomplish autonomously. This paper proposes a deep reinforcement learning (DRL) controller to handle the nonlinear attitude control problem, enabling extended flight envelopes for fixed-wing UAVs. A proof-of-concept controller using the proximal policy optimization (PPO) algorithm is developed, and is shown to be capable of stabilizing a fixed-wing UAV from a large set of initial conditions to reference roll, pitch and airspeed values. The training process is outlined and key factors for its progression rate are considered, with the most important factor found to be limiting the number of variables in the observation vector, and including values for several previous time steps for these variables. The trained reinforcement learning (RL) controller is compared to a proportional-integral-derivative (PID) controller, and is found to converge in more cases than the PID controller, with comparable performance. Furthermore, the RL controller is shown to generalize well to unseen disturbances in the form of wind and turbulence, even in severe disturbance conditions.